Metal backend: Add Metal int4 quantization support to Parakeet#17235
manuelcandales merged 57 commits into main
Pull request overview
This PR implements 4-bit weight quantization support for the Parakeet TDT model on the Metal backend using torchao's MPS API. The changes enable Metal-specific quantization while maintaining existing CUDA quantization workflows.
Changes:
- Added `fpa4w` (floating point activation, 4-bit weight) quantization option for Metal backend
- Implemented validation to ensure Metal-specific quantization is only used with Metal backend
- Updated CI workflows to test Metal int4 quantization with parakeet-tdt model
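
The backend validation described above might look roughly like the following sketch. It is an illustration only, not the code from export_parakeet_tdt.py: the function name `validate_quantization_backend`, the `METAL_ONLY_QUANT` set, and the backend strings `"metal"` and `"cuda"` are assumptions. Only the `fpa4w` option name comes from this PR.

```python
# Hypothetical sketch: every name except the "fpa4w" option is an assumption.
from typing import Optional

# Quantization configs assumed to be valid only on the Metal backend.
METAL_ONLY_QUANT = {"fpa4w"}


def validate_quantization_backend(qlinear: Optional[str], backend: str) -> None:
    """Raise ValueError if a Metal-only quantization config is paired with another backend."""
    if qlinear in METAL_ONLY_QUANT and backend != "metal":
        raise ValueError(
            f"Quantization config '{qlinear}' requires the Metal backend, got '{backend}'."
        )


validate_quantization_backend("fpa4w", "metal")  # Metal-only config on Metal: accepted
validate_quantization_backend(None, "cuda")      # no quantization requested: accepted
```

Failing early with a clear message, before any export work starts, keeps a mismatched configuration from surfacing later as a harder-to-read lowering error.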
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| third-party/ao | Updated torchao submodule to version with Metal quantization support |
| examples/models/parakeet/quantize.py | Added Metal int4 quantization implementation using UIntxWeightOnlyConfig |
| examples/models/parakeet/export_parakeet_tdt.py | Added fpa4w option and validation for Metal backend requirement |
| examples/models/parakeet/README.md | Updated documentation with fpa4w config and Metal quantization example |
| .github/workflows/metal.yml | Added int4 quantization testing for parakeet-tdt model |
| .ci/scripts/export_model_artifact.sh | Added quantized-int4-metal option with backend validation |
@manuelcandales In the README.md, do you want to add running `EXECUTORCH_BUILD_KERNELS_TORCHAO=1 TORCHAO_BUILD_EXPERIMENTAL_MPS=1 ./install_executorch.sh` as a prerequisite step for int4 Metal quantization?
yeah, that's true
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
    config = UIntxWeightOnlyConfig(
        group_size=qlinear_group_size,
        bitwidth=4,
Update the pin past pytorch/ao#3829, and set `uintx_choose_qparams_algorithm="hqq"`
Could be done in a follow-up PR too
yes, that's my plan, to do in follow-up PR
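
To make `group_size` and `bitwidth=4` in the reviewed config concrete, here is a small pure-Python sketch of group-wise 4-bit weight quantization. It uses plain min/max affine quantization, which is an assumption chosen for simplicity: it is neither torchao's implementation nor the HQQ parameter search suggested in the thread, and all function names are hypothetical.

```python
# Illustrative only: plain min/max affine quantization, not torchao's kernels or HQQ.
from typing import List, Tuple

QMIN, QMAX = 0, 15  # an unsigned 4-bit code has 16 levels


def quantize_group_uint4(weights: List[float]) -> Tuple[List[int], float, int]:
    """Quantize one group of weights to 4-bit codes with a shared scale and zero point."""
    # Extend the range to include 0.0 so the zero point falls inside [QMIN, QMAX].
    lo = min(min(weights), 0.0)
    hi = max(max(weights), 0.0)
    scale = (hi - lo) / (QMAX - QMIN) or 1.0  # fall back to 1.0 for an all-zero group
    zero_point = round(-lo / scale)
    codes = [max(QMIN, min(QMAX, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def dequantize_group(codes: List[int], scale: float, zero_point: int) -> List[float]:
    """Map 4-bit codes back to approximate floating point weights."""
    return [(c - zero_point) * scale for c in codes]


def quantize_row(row: List[float], group_size: int) -> List[Tuple[List[int], float, int]]:
    """Split a weight row into groups of `group_size` and quantize each independently."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    return [
        quantize_group_uint4(row[i : i + group_size])
        for i in range(0, len(row), group_size)
    ]


row = [-1.0, 0.0, 0.5, 2.0, 0.1, 0.2, 0.3, 0.4]
for codes, scale, zero_point in quantize_row(row, group_size=4):
    print(codes, scale, zero_point, dequantize_group(codes, scale, zero_point))
```

Smaller groups give each scale a narrower range to cover, which lowers the rounding error at the cost of storing more scales and zero points. Activations stay in floating point, which is what the "fpa" in `fpa4w` refers to.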
This PR adds support for 4-bit weight quantization on the Metal backend for the Parakeet TDT model.
Parakeet Export Script (export_parakeet_tdt.py, quantize.py)
Documentation (README.md)
CI Integration (export_model_artifact.sh, metal.yml)
Dependencies